Optimizing shared cache behavior of chip multiprocessors

Abstract

One of the critical problems associated with emerging chip multiprocessors (CMPs) is the management of on-chip shared cache space. Unfortunately, single-processor-centric data locality optimization schemes may not work well in the CMP case, because data accesses from multiple cores can create conflicts in the shared cache space. The main contribution of this paper is a compiler-directed code restructuring scheme for enhancing the locality of shared data in CMPs. The proposed scheme targets the last-level shared cache found in many commercial CMPs and has two components: allocation, which determines the set of loop iterations assigned to each core, and scheduling, which determines the order in which the iterations assigned to a core are executed. Our scheme restructures the application code so that the different cores operate on shared data blocks at the same time, to the extent allowed by data dependencies. This reduces reuse distances for the shared data and improves on-chip cache performance. We evaluated our approach using the Splash-2 and Parsec applications, through both simulations and experiments on two commercial multi-core machines. Our experimental evaluation indicates that the proposed data locality optimization scheme reduces inter-core conflict misses in the shared cache by 67% on average when both allocation and scheduling are used. In addition, the execution time improvements we achieve (29% on average) are very close to the optimal savings that could be achieved using a hypothetical scheme. Copyright 2009 ACM.
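The abstract's central claim is that making cores touch the same shared block at the same time shortens reuse distances and so cuts shared-cache misses. The toy model below illustrates only that effect; it is not the paper's compiler algorithm. It assumes (hypothetically) that every loop iteration touches exactly one shared block, that cores issue accesses in lockstep round-robin order, and that the last-level cache is fully associative LRU, where an access hits exactly when its reuse distance is smaller than the capacity. The function names `interleave`, `reuse_distances`, and `lru_misses` are illustrative.

```python
from typing import Dict, List, Optional, Sequence


def interleave(schedules: Sequence[Sequence[int]]) -> List[int]:
    """Merge per-core iteration schedules into one shared-cache access trace.

    Hypothetical lockstep model: at step t, core 0 issues its t-th access,
    then core 1, and so on. Each entry is the shared block an iteration reads.
    """
    trace: List[int] = []
    steps = max(len(s) for s in schedules)
    for t in range(steps):
        for sched in schedules:
            if t < len(sched):
                trace.append(sched[t])
    return trace


def reuse_distances(trace: Sequence[int]) -> List[Optional[int]]:
    """Reuse distance = number of distinct blocks accessed since the previous
    access to the same block; None marks a first (cold) access."""
    last_seen: Dict[int, int] = {}
    result: List[Optional[int]] = []
    for i, block in enumerate(trace):
        if block in last_seen:
            between = set(trace[last_seen[block] + 1 : i])
            result.append(len(between))
        else:
            result.append(None)
        last_seen[block] = i
    return result


def lru_misses(trace: Sequence[int], capacity: int) -> int:
    """Misses in a fully associative LRU cache holding `capacity` blocks:
    an access misses if it is cold or its reuse distance is >= capacity."""
    return sum(1 for d in reuse_distances(trace) if d is None or d >= capacity)


if __name__ == "__main__":
    n = 8
    forward = list(range(n))
    backward = forward[::-1]
    # Poor schedule: the two cores sweep the shared blocks in opposite orders.
    naive = interleave([forward, backward])
    # Restructured schedule: both cores touch the same shared block together.
    aligned = interleave([forward, forward])
    print(lru_misses(naive, capacity=4), lru_misses(aligned, capacity=4))  # 12 8
```

In the opposite-order schedule the second access to a block can be up to seven distinct blocks away, so half of the reuses overflow a four-block cache; aligning the orders drops every reuse distance to zero, leaving only the eight cold misses. A real compiler must additionally respect data dependencies when reordering, which this sketch ignores.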